Better fix of `timeout` activity tick after restart by jglick · Pull Request #234 · jenkinsci/workflow-basic-steps-plugin

jglick · 2022-08-29T16:59:57Z

#226 correctly diagnosed a problem brought to light by a flaky test, but the fix was not correct (only applied to messages printed within the controller as opposed to from an agent) and apparently introduced a severe performance regression.

I think the root issue was that restarting the controller ought to count as “activity” for purposes of delaying the timeout (better to err on the side of caution) but it did not.
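A minimal sketch of the idea, assuming a simplified inactivity timer rather than the plugin's actual `TimeoutStepExecution`: the class `InactivityTimer` and its methods are hypothetical names used only for illustration. Resuming after a restart is treated like any other activity, so the deadline is pushed out by the full timeout.

```java
import java.time.Duration;

/** Hypothetical inactivity timer; NOT the plugin's TimeoutStepExecution. */
final class InactivityTimer {
    private final long timeoutMillis;
    private long lastActivityMillis;

    InactivityTimer(Duration timeout, long nowMillis) {
        this.timeoutMillis = timeout.toMillis();
        this.lastActivityMillis = nowMillis;
    }

    /** Record activity (for example, a line of log output). */
    void onActivity(long nowMillis) {
        lastActivityMillis = nowMillis;
    }

    /** Assumption: a controller restart counts as activity, erring on the side of caution. */
    void onResume(long nowMillis) {
        onActivity(nowMillis);
    }

    /** Milliseconds left before the timer expires, never negative. */
    long remainingMillis(long nowMillis) {
        return Math.max(0L, lastActivityMillis + timeoutMillis - nowMillis);
    }

    boolean isExpired(long nowMillis) {
        return remainingMillis(nowMillis) == 0L;
    }

    public static void main(String[] args) {
        InactivityTimer timer = new InactivityTimer(Duration.ofSeconds(15), 0L);
        long afterDowntime = 12_000L; // controller was down for 12 s
        System.out.println("before resume: " + timer.remainingMillis(afterDowntime) + " ms");
        timer.onResume(afterDowntime);
        System.out.println("after resume:  " + timer.remainingMillis(afterDowntime) + " ms");
    }
}
```

Without the `onResume` call, the downtime itself would eat into the inactivity budget, which is the kind of premature timeout the flaky test exposed.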

Pldi23 · 2022-08-29T17:53:04Z

Thanks @jglick for the quick reaction to this regression. Just in case, I've set up a run of 300 executions in the test selector: https://gauntlet-2.cloudbees.com/rosie/job/playground/job/flakebusters/job/selectors/job/pct-test-selector/168/

jglick · 2022-08-29T20:37:52Z

Seems it occasionally fails in

workflow-basic-steps-plugin/src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java

Line 233 in 85b6c33

j.assertLogContains("JustHere!", b);

   0.766 [restarted #1] Resuming build at Mon Aug 29 20:10:01 UTC 2022 after Jenkins restart
   0.767 [restarted #1] Waiting to resume part of restarted #1: Waiting for next available executor
   1.826 [restarted #1] Timeout set to expire after 2 sec without activity
   2.083 [restarted #1] Ready to run at Mon Aug 29 20:10:02 UTC 2022
   2.393 [restarted #1] [Pipeline] echo
   2.401 [id=63]	FINE	o.j.p.w.s.TimeoutStepExecution$Tick#schedule: scheduling tick for 0 ms
   2.401 [id=91]	FINE	o.j.p.w.s.TimeoutStepExecution$Tick#schedule: scheduling tick for 1.5 sec
   2.447 [restarted #1] NotHereYet
   2.598 [restarted #1] [Pipeline] sleep
   2.647 [id=63]	FINE	o.j.p.w.s.TimeoutStepExecution$Tick#schedule: scheduling tick for 0 ms
   2.647 [id=93]	FINE	o.j.p.w.s.TimeoutStepExecution$Tick#schedule: scheduling tick for 1.5 sec
   2.648 [restarted #1] Sleeping for 10 sec
   3.840 [id=85]	FINE	o.j.p.w.s.TimeoutStepExecution#resetTimer: resetting timer on 8de14d60-6560-4abb-9426-1128deb999e5
   3.903 [id=83]	FINE	o.j.p.w.s.TimeoutStepExecution$Tick#schedule: scheduling tick for 7.5 sec
   3.957 [restarted #1] Cancelling nested steps due to timeout

Unclear if this is a bug in production code or just in the test. In any case, I am inclined to ship this amendment anyway, at least if @chwehrli @tarioch can confirm it addresses the performance regression.

Pldi23 · 2022-08-30T05:32:41Z

TimeoutStepTest#activityRestart still flaky, failed 3 times in 300 executions.

chwehrli · 2022-08-30T12:24:00Z

This also looks good from our end with regard to the performance issues we noticed with #226: Jenkins has been running fine for several hours under regular load. Thanks!

Pldi23 · 2022-08-30T12:26:14Z

Probably test changes are not needed? If I run ‘old’-versioned test against your fix the test does not flake.
https://gauntlet-2.cloudbees.com/rosie/job/playground/job/flakebusters/job/selectors/job/pct-test-selector/169/

jglick

Just noting the reason for test changes.

jglick · 2022-08-30T12:32:23Z

src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java


    @Rule public GitSampleRepoRule git = new GitSampleRepoRule();

+    @Rule public LoggerRule logging = new LoggerRule().record(TimeoutStepExecution.class, Level.FINE);


Just to help debug.

jglick · 2022-08-30T12:34:04Z

src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java

                        + "    sleep 10;\n"
                        + "    echo 'JustHere!';\n"
-                        + "    sleep 30;\n"
+                        + "    sleep 20;\n"


A pre-commit version of the main patch contained a mistake: the build did not time out until 1½× the stated period, so in this case `sleep` would run for ~23s before being terminated. I was trying to strengthen the test here to verify that the 15s activity timeout really applies.
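The arithmetic behind this change can be sketched as follows. The class and method names are hypothetical, and the model is deliberately simplified: it assumes the last activity happens right before the final `sleep` starts, so the sleep is interrupted only if the effective timeout is shorter than the sleep.

```java
/** Hypothetical helper illustrating why `sleep 20` is a stricter check than `sleep 30`. */
final class ActivityTimeoutCheck {

    /**
     * Returns true if a sleep of sleepSec seconds would be interrupted by an
     * inactivity timeout of timeoutSec seconds scaled by the given factor
     * (1.0 = intended behavior, 1.5 = the pre-commit mistake).
     */
    static boolean sleepInterrupted(double sleepSec, double timeoutSec, double factor) {
        return timeoutSec * factor < sleepSec;
    }

    public static void main(String[] args) {
        double timeoutSec = 15.0;
        for (double sleepSec : new double[] {30.0, 20.0}) {
            System.out.printf("sleep %.0f s: intended=%b, 1.5x mistake=%b%n",
                    sleepSec,
                    sleepInterrupted(sleepSec, timeoutSec, 1.0),
                    sleepInterrupted(sleepSec, timeoutSec, 1.5));
        }
    }
}
```

Under these assumptions, a 30s sleep is interrupted in both cases (15s and 22.5s are both under 30s), so it cannot tell the two behaviors apart. A 20s sleep is interrupted only when the timeout really fires at 15s.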

jglick · 2022-08-30T12:34:21Z

src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java

                WorkflowRun b = p.scheduleBuild2(0).getStartCondition().get();
                SemaphoreStep.waitForStart("restarted/1", b);
        });
+        Thread.sleep(10_000); // restarting should count as activity


Making the test reliably reproduce the originally reported problem.

jglick · 2022-08-30T12:37:56Z

If I run ‘old’-versioned test against your fix the test does not flake.

Like what I did in 512d920? OK, if that is what it takes to address the regression and prevent test flakes, then we can go with that. I just want to ship this promptly.

Better fix of timeout activity tick after restart

85b6c33

jglick requested a review from Pldi23 August 29, 2022 17:00

jglick added the bug label Aug 29, 2022

jglick mentioned this pull request Aug 29, 2022

Call ResetTimer synchronously from eol and skip the Tick when channel is null #226

Merged

jglick commented Aug 30, 2022

Reverting test changes since they seem to flake ~1%: jenkinsci#234 (c…

512d920

…omment)

Pldi23 approved these changes Aug 30, 2022

jglick enabled auto-merge August 30, 2022 12:51

jglick merged commit d57e3ca into jenkinsci:master Aug 30, 2022

jglick deleted the TimeoutStepExecution branch August 30, 2022 14:03

jglick merged 2 commits into jenkinsci:master from jglick:TimeoutStepExecution
